AITopics | case study

Collaborating Authors

case study

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FRESH: Information-Geometric Calibration of Patient-Level Models to Aggregate Evidence

Fuller, Franklin; Bertolini, Daniele; Liang, Samantha; Christopher, Jason; Smith, Aaron M.

arXiv.org Machine Learning, May-18-2026

Many decisions in clinical science and epidemiology (estimating the probability of technical success for a clinical trial, assessing the comparative effectiveness of two therapies, imputing a placebo effect onto natural history data) rely on combining sources of information about a clinical cohort that come from different kinds of studies. Specifically, we contrast patient-level sources that provide granular pictures of individual disease courses (clinical trials, registries, or electronic health records) with aggregate sources such as published clinical trial results and tables, figures, and listings (TFLs). One strategy for combining aggregate and patient-level data sources is to bring each into a common format for a unified analysis. If one wants to maintain the analytic flexibility of patient-level data, a natural solution is to convert the aggregate information into a simulated patient-level dataset that recapitulates those aggregate statistics. This is an under-determined inverse problem: many such datasets exist, so the problem is not well specified without further constraints. FRESH (Fusion of Recent Evidence with Subject Histories) provides a well-defined method for solving this problem, thereby providing maximal analytic flexibility.
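The abstract does not spell out FRESH's own algorithm. To illustrate how an extra constraint turns the under-determined inverse problem into a well-defined one, here is a minimal sketch that uses a swapped-in, standard technique, not FRESH itself: exponential tilting (minimum-KL reweighting, as in entropy balancing) of an existing patient-level sample so that a weighted covariate mean matches a published aggregate mean. Among all reweightings that hit the target, the tilted weights are the ones closest to uniform in KL divergence. The function names, the single covariate, and the bisection bounds are assumptions made for illustration.

```python
import math


def tilt_weights(values, lam):
    """Hypothetical helper: weights w_i proportional to exp(lam * x_i), summing to 1."""
    m = max(lam * v for v in values)  # subtract the max exponent for numerical stability
    raw = [math.exp(lam * v - m) for v in values]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))


def calibrate_to_mean(values, target, lo=-50.0, hi=50.0, iters=200):
    """Hypothetical helper: find the tilt whose weighted mean equals `target`.

    The tilted mean is nondecreasing in lam (its derivative is the weighted
    variance), so bisection on lam converges. Among all weight vectors with
    this mean, the tilted one minimizes KL divergence from uniform weights.
    """
    if not (min(values) < target < max(values)):
        raise ValueError("target must lie strictly inside the range of the data")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weighted_mean(values, tilt_weights(values, mid)) < target:
            lo = mid
        else:
            hi = mid
    return tilt_weights(values, 0.5 * (lo + hi))


# Usage: reweight four patients so the weighted covariate mean is 3.0.
ages = [1.0, 2.0, 3.0, 4.0]
weights = calibrate_to_mean(ages, 3.0)
```

Resampling patients according to these weights would give one simulated cohort that reproduces the aggregate mean; matching several statistics at once requires a multivariate tilt rather than a one-dimensional bisection.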

artificial intelligence, constraint, machine learning

arXiv.org Machine Learning

2605.16246

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks Supplementary Materials

Neural Information Processing Systems, Apr-30-2026, 03:53:44 GMT

The source code of Minigrid and Miniworld can be found at https://github.com/. To run the experiments, we implemented the following functionality:
1. human trajectory saving for MiniGrid-FourRooms-v0 (we copied the ManualControl class from Minigrid and added 38 lines of code, mostly calls to data-saving functions);
2. human trajectory saving for MiniWorld-FourRooms-v0 (we copied the ManualControl class from Miniworld and added 45 lines of code, mostly calls to data-saving functions);
3. data saving and plotting for MiniGrid-FourRooms-v0 (33 lines of code, mostly for Matplotlib);
4. data saving and plotting for MiniWorld-FourRooms-v0 (33 lines of code, mostly for Matplotlib).
In total, this new functionality required 149 lines of code. The source code is hosted on GitHub. We bear all responsibility in case of violation of rights.
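The supplement does not show its data-saving code. A minimal sketch of what human trajectory saving might look like follows, assuming observations have already been flattened to lists of numbers and that the caller passes in values from a Gymnasium-style `env.step(action)` call, which returns `(observation, reward, terminated, truncated, info)`. `Step` and `TrajectoryRecorder` are hypothetical names, not part of Minigrid or Miniworld.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    """One transition from a human-controlled episode."""
    observation: list
    action: int
    reward: float
    terminated: bool


@dataclass
class TrajectoryRecorder:
    """Hypothetical helper: buffers the transitions of a single episode."""
    steps: list = field(default_factory=list)

    def record(self, observation, action, reward, terminated):
        # Copy the observation so later mutation by the environment cannot alter the log.
        self.steps.append(Step(list(observation), int(action), float(reward), bool(terminated)))

    def total_reward(self):
        return sum(s.reward for s in self.steps)

    def to_records(self):
        # Plain dicts are easy to serialize as JSON or to load later for plotting.
        return [asdict(s) for s in self.steps]

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.to_records(), f)


# Usage with a stand-in two-step episode (no environment dependency).
recorder = TrajectoryRecorder()
recorder.record([0, 1], action=2, reward=0.0, terminated=False)
recorder.record([1, 1], action=1, reward=0.9, terminated=True)
```

Keeping the recorder separate from the manual-control loop means the only change needed in a copied control class is one `record(...)` call per step plus a `save(...)` call when the episode ends.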

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country: North America (0.17)

Industry: Education (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)

8 max

Neural Information Processing Systems, Apr-25-2026, 10:12:16 GMT

We proceed to show the sparsistency of the estimated parameters. First, suppose that $\Theta_{t;ij} \neq 0$ for some time $t$ and index $(i,j)$. Due to $0 < \gamma < 1$, the above inequality implies that $\hat{\Theta}_{t;ij} = 0$ for every $t$ and $(i,j) \notin S_t$, and $\hat{\Theta}_{t;ij} - \hat{\Theta}_{t-1;ij} = 0$ for every $t > 0$ and $(i,j) \notin D_t$. The proof is inspired by Corollary 1 in [47]. First, we present the following key lemmas.
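The two zero-pattern conditions in this excerpt can be checked mechanically on a sequence of estimated matrices. The sketch below assumes, consistently with the excerpt but without confirmation from it, that $S_t$ is the support of the true $\Theta_t$ and $D_t$ is the set of entries that change between $t-1$ and $t$. Matrices are nested Python lists, `zero_pattern_recovered` is a hypothetical name, and only the "no spurious nonzeros" half of sparsistency is tested, not recovery of the true nonzero entries.

```python
def support(theta, tol=0.0):
    """Index pairs (i, j) with |theta[i][j]| > tol."""
    return {(i, j) for i, row in enumerate(theta)
            for j, v in enumerate(row) if abs(v) > tol}


def change_set(theta_t, theta_prev, tol=0.0):
    """Index pairs whose value changes by more than tol between consecutive times."""
    return {(i, j) for i, row in enumerate(theta_t)
            for j, v in enumerate(row) if abs(v - theta_prev[i][j]) > tol}


def zero_pattern_recovered(est, true, tol=1e-8):
    """Hypothetical check of the two zero-pattern conditions.

    est[t] must be zero off S_t = support(true[t]), and est[t] - est[t-1]
    must be zero off D_t = change_set(true[t], true[t-1]) for t > 0.
    `tol` absorbs floating-point noise in the estimates.
    """
    for t, theta in enumerate(true):
        if not support(est[t], tol) <= support(theta):
            return False
        if t > 0 and not change_set(est[t], est[t - 1], tol) <= change_set(theta, true[t - 1]):
            return False
    return True


# Usage: two time steps of 2x2 matrices; only entry (1, 1) changes over time.
true = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 2.0]]]
good = [[[0.9, 0.0], [0.0, 1.1]], [[0.9, 0.0], [0.0, 1.8]]]
spurious_edge = [[[0.9, 0.0], [0.0, 1.1]], [[0.9, 0.3], [0.0, 1.8]]]
spurious_change = [[[0.9, 0.0], [0.0, 1.1]], [[0.8, 0.0], [0.0, 1.8]]]
```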

artificial intelligence, precision matrix, runtime

Neural Information Processing Systems

Industry: Banking & Finance > Trading (0.47)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)

13b45b44e26c353c64cba9529bf4724f-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 17:29:32 GMT

artificial intelligence, machine learning, node

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

Neural Information Processing Systems, Mar-22-2026, 16:04:43 GMT

Despite the remarkable success of transformer-based models in various real-world tasks, their underlying mechanisms remain poorly understood. Recent studies have suggested that transformers can implement gradient descent as an in-context learner for linear regression problems and have developed various theoretical analyses accordingly. However, these works mostly study the expressive power of transformers through specific hand-designed parameter constructions and offer little understanding of how transformers actually work after training. In this study, we consider a sparse linear regression problem and investigate how a trained multi-head transformer performs in-context learning. We find experimentally that the use of multiple heads follows different patterns across layers: multiple heads are used and essential in the first layer, while a single head is usually sufficient for subsequent layers. We provide a theoretical explanation for this observation: the first layer preprocesses the context data, and the following layers execute simple optimization steps based on the preprocessed context. Moreover, we demonstrate that such a preprocess-then-optimize algorithm can significantly outperform naive gradient descent and ridge regression algorithms. Further experimental results support our explanations. Our findings offer insights into the benefits of multi-head attention and contribute to understanding the more intricate mechanisms hidden within trained transformers.
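The "simple optimization steps" the later layers are said to execute can be made concrete with the baseline the abstract compares against: a few steps of gradient descent on the in-context least-squares loss, started from $w = 0$. The sketch below is an assumption-laden illustration, not the paper's construction. The optional diagonal `precond` argument is a hypothetical stand-in for the first layer's preprocessing, whose actual form is not given here, and `gd_predict` is a hypothetical name.

```python
def gd_predict(xs, ys, query, steps, lr, precond=None):
    """Predict y for `query` after `steps` of (optionally preconditioned) gradient
    descent on the in-context loss (1 / 2n) * sum_i (w . x_i - y_i)^2, from w = 0."""
    d = len(query)
    p = precond if precond is not None else [1.0] * d  # hypothetical diagonal preprocessing
    n = len(xs)
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            residual = sum(wk * xk for wk, xk in zip(w, x)) - y
            for k in range(d):
                grad[k] += residual * x[k] / n  # gradient is (1/n) * sum_i r_i * x_i
        w = [wk - lr * pk * gk for wk, pk, gk in zip(w, p, grad)]
    return sum(wk * qk for wk, qk in zip(w, query))


# Usage: context generated by w* = (1, 2); the query (1, 1) has true label 3.
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [1.0, 2.0]
plain = gd_predict(xs, ys, [1.0, 1.0], steps=50, lr=1.0)
preconditioned = gd_predict(xs, ys, [1.0, 1.0], steps=1, lr=1.0, precond=[2.0, 2.0])
```

On this toy context, plain gradient descent with `lr=1.0` approaches the answer geometrically, while a well-chosen preconditioner reaches it in one step, which is the intuition behind preprocessing before optimizing.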

artificial intelligence, machine learning, transformer utilize multi-head attention

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

HYDRA-FL: Hybrid Knowledge Distillation for Robust and Accurate Federated Learning

Neural Information Processing Systems, Mar-20-2026, 19:10:58 GMT

Data heterogeneity among Federated Learning (FL) users poses a significant challenge, resulting in reduced global model performance. The community has designed various techniques to tackle this issue, among which Knowledge Distillation (KD)-based techniques are common. While these techniques effectively improve performance under high heterogeneity, they inadvertently cause greater accuracy degradation under model poisoning attacks (known as attack amplification). This paper presents a case study that reveals this critical vulnerability in KD-based FL systems. We provide empirical evidence for why KD causes this issue and use it to motivate a hybrid distillation technique. We introduce a novel algorithm, Hybrid Knowledge Distillation for Robust and Accurate FL (HYDRA-FL), which reduces the impact of poisoning attacks by offloading some of the KD loss to a shallow layer via an auxiliary classifier. We model HYDRA-FL as a generic framework and adapt it to two KD-based FL algorithms, FedNTD and MOON. Using these two as case studies, we demonstrate that our technique outperforms baselines in attack settings while maintaining comparable performance in benign settings.
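"Offloading some of the KD loss to a shallow layer" can be pictured as splitting one distillation term between the final classifier and an auxiliary classifier attached to a shallow layer. The sketch below assumes a generic temperature-softened KL distillation term, teacher logits supplied by the caller (for example from the global model), and a hypothetical split weight `lam`. It is not the exact loss used by HYDRA-FL, FedNTD, or MOON, and all names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp((z - m) / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_div(p, q):
    """KL(p || q) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def hybrid_kd_loss(final_logits, aux_logits, teacher_logits, label,
                   lam=0.5, beta=1.0, temperature=2.0):
    """Hypothetical hybrid loss: CE on the final head plus a KD term whose
    weight is split between the final head (1 - lam) and the shallow
    auxiliary head (lam)."""
    teacher = softmax(teacher_logits, temperature)
    kd_final = kl_div(teacher, softmax(final_logits, temperature))
    kd_aux = kl_div(teacher, softmax(aux_logits, temperature))
    return cross_entropy(final_logits, label) + beta * ((1 - lam) * kd_final + lam * kd_aux)
```

Setting `lam=0` recovers ordinary final-layer distillation; raising it moves part of the pull toward the teacher away from the final layer, which is the design lever the abstract describes.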

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

GeMA: Learning Latent Manifold Frontiers for Benchmarking Complex Systems

Li, Jia Ming; Anupriya; Graham, Daniel J.

arXiv.org Machine Learning, Mar-18-2026

Benchmarking the performance of complex systems such as rail networks, renewable generation assets and national economies is central to transport planning, regulation and macroeconomic analysis. Classical frontier methods, notably Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA), estimate an efficient frontier in the observed input-output space and define efficiency as distance to this frontier, but rely on restrictive assumptions on the production set and only indirectly address heterogeneity and scale effects. We propose Geometric Manifold Analysis (GeMA), a latent manifold frontier framework implemented via a productivity-manifold variational autoencoder (ProMan-VAE). Instead of specifying a frontier function in the observed space, GeMA represents the production set as the boundary of a low-dimensional manifold embedded in the joint input-output space. A split-head encoder learns latent variables that capture technological structure and operational inefficiency. Efficiency is evaluated with respect to the learned manifold, endogenous peer groups arise as clusters in latent technology space, a quotient construction supports scale-invariant benchmarking, and a local certification radius, derived from the decoder Jacobian and a Lipschitz bound, quantifies the geometric robustness of efficiency scores. We validate GeMA on synthetic data with non-convex frontiers, heterogeneous technologies and scale bias, and on four real-world case studies: global urban rail systems (COMET), British rail operators (ORR), national economies (Penn World Table) and a high-frequency wind-farm dataset. Across these domains GeMA behaves comparably to established methods when classical assumptions hold, and provides additional insight in settings with pronounced heterogeneity, non-convexity or size-related bias.
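GeMA itself trains a variational autoencoder and is not sketched here. To make the classical idea it builds on concrete (estimate a frontier in the observed input-output space and score each unit by its distance to it), here is a minimal sketch of input-oriented Free Disposal Hull (FDH) efficiency. FDH is a swapped-in, non-convex relative of DEA that needs no linear program. Inputs are assumed strictly positive, and the function name is hypothetical.

```python
def fdh_input_efficiency(inputs, outputs, o):
    """Input-oriented FDH efficiency of unit `o` (1.0 means on the frontier).

    For every observed unit j producing at least unit o's outputs, the
    smallest factor theta with inputs[j] <= theta * inputs[o] is
    max_i inputs[j][i] / inputs[o][i]; the efficiency is the minimum of
    these factors over all such dominating units.
    """
    best = 1.0  # unit o itself is always a feasible comparator
    for j in range(len(inputs)):
        if all(yj >= yo for yj, yo in zip(outputs[j], outputs[o])):
            ratio = max(xj / xo for xj, xo in zip(inputs[j], inputs[o]))
            best = min(best, ratio)
    return best


# Usage: one input, one output, three units.
inputs = [[2.0], [4.0], [3.0]]
outputs = [[1.0], [1.0], [2.0]]
scores = [fdh_input_efficiency(inputs, outputs, o) for o in range(3)]
```

Unit 1 produces the same output as unit 0 with twice the input, so it scores 0.5; units 0 and 2 are undominated and score 1.0.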

artificial intelligence, inputlog, machine learning

arXiv.org Machine Learning

2603.16729

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Rail (1.00)
Energy > Renewable (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

Neural Information Processing Systems, Mar-17-2026, 18:55:07 GMT

Most distributed machine learning systems today, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms is the high communication cost on the central node. Motivated by this, we ask: can decentralized algorithms be faster than their centralized counterparts? Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms; they simply assume an application scenario in which only a decentralized network is available. In this paper, we study a D-PSGD algorithm and provide the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent. This is because D-PSGD has total computational complexity comparable to that of C-PSGD but incurs much lower communication cost on the busiest node.
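A minimal sketch of a D-PSGD-style update: each node averages its parameters with its ring neighbors through a doubly stochastic mixing matrix and then takes a gradient step on its own local loss. Every node talks to only two neighbors, however many nodes there are, which is why no single node becomes a communication bottleneck. Assumptions: the local losses $f_i(x) = \tfrac{1}{2}(x - a_i)^2$ are hypothetical, gradients are exact rather than stochastic, and the function names are illustrative only.

```python
def ring_mixing_matrix(n):
    """Doubly stochastic matrix: each node averages itself and its two ring neighbors."""
    if n < 3:
        raise ValueError("a ring with distinct neighbors needs at least 3 nodes")
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def d_psgd(targets, lr=0.1, iters=500):
    """Node i holds the hypothetical local loss 0.5 * (x - targets[i])**2."""
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = [0.0] * n
    for _ in range(iters):
        # Gossip step: each node mixes only its own and its neighbors' parameters.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local gradient step, evaluated at each node's pre-mixing iterate.
        x = [mixed[i] - lr * (x[i] - targets[i]) for i in range(n)]
    return x


# Usage: four nodes whose local optima average to 3.0.
params = d_psgd([1.0, 2.0, 3.0, 6.0])
```

Because the columns of `W` sum to one, the network-wide average follows plain gradient descent on the global objective and converges to the mean of the local optima; with a constant step size, individual nodes settle near, but not exactly at, that consensus value.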

artificial intelligence, decentralized algorithm outperform centralized algorithm, machine learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

13b45b44e26c353c64cba9529bf4724f-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 22:32:50 GMT

artificial intelligence, machine learning, node

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
